Description

FIELD OF THE INVENTION

This invention relates to a method and system for obtaining a high quality still image from multiple fields of an
interlaced video signal, i.e., deinterlacing, in the presence of both dominant motion, such as camera zoom, rotation,
pan, or jitter, and local motion of independently moving objects.

BACKGROUND OF THE INVENTION 



The recent availability of video frame grabbing hardware has allowed video signals to appear on a variety of new
platforms, such as multi-media computers and video printing systems. These new platforms have generated a desire
in the user to view and print video signals in a different mode than was originally intended, namely as still images. The
interlaced video standard, while being sufficient for displaying moving pictures at satisfactory quality, is ineffective for
displaying stills since only one half of the information needed to display an image is acquired at a single time. As a result,
video must be deinterlaced, i.e., converted to progressive video, before it can be viewed as a sequence of stills. The
deinterlacing process, which refers to forming frames from fields, is complicated by the possibility that motion can
change the scene contents from field to field. Relative motion between fields can be caused by movement of objects
in the scene relative to the camera, or by camera changes such as pan, zoom and jitter, where the latter is rather
common in hand-held consumer camcorders.

The prior art addresses the problem of deinterlacing an even (or an odd) field by estimating the missing odd (or
even) lines. A well-known method is to merge the even and odd fields, i.e., to fill in the missing lines of the odd (even)
field by the lines of the neighboring even (odd) field. This simple mechanism causes spatial "judder" artifacts at those
image regions that contain moving objects (objects that move within the time interval of two successive fields). Merging,
however, provides the best spatial resolution at steady image regions. Another approach to deinterlacing is to
concentrate on a single field only (e.g., the odd field) and interpolate the missing lines using spatial interpolation. A simple
interpolation technique is vertical linear interpolation, where an average of the available pixel values above and below
the missing pixel is assigned to the missing pixel. This method may cause artifacts if the missing pixel is over an edge
whose orientation is not vertical. To overcome these artifacts, a contour-sensitive spatial interpolation method is
proposed in M. Isnardi, "Modeling the Television Process," Technical Report No. 515, Massachusetts Institute of
Technology, 1986, pages 161 to 163. This method attempts to find the orientation of the image gradient at the missing pixel.
Interpolation is then performed using image values that are along this orientation in order not to "cross an edge" and
cause artifacts.
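
The two basic prior-art operations described above, merging and vertical linear interpolation, can be sketched as follows. This is only an illustrative sketch: fields are assumed to be lists of pixel rows, the function names are hypothetical, and the handling of the last missing line (repeating the line above) is a boundary choice made here, not something the text specifies.

```python
def weave(even_field, odd_field):
    """Merge two fields into one frame: even rows from even_field, odd rows from odd_field."""
    frame = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(list(even_row))
        frame.append(list(odd_row))
    return frame


def vertical_linear_interpolate(field):
    """Fill each missing line with the average of the field lines above and below it.

    The last missing line has no line below it, so the line above is repeated
    (an assumption of this sketch).
    """
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        below = field[i + 1] if i + 1 < len(field) else row
        frame.append([(a + b) / 2 for a, b in zip(row, below)])
    return frame


even = [[10, 10], [30, 30]]
odd = [[20, 20], [40, 40]]
print(weave(even, odd))
print(vertical_linear_interpolate(even))
```

Merging keeps every acquired line, which is why it gives the best resolution in steady regions, while interpolation uses a single field and so cannot show judder.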

A method that is potentially more effective is a hybrid method where the deinterlacing process switches, on a
pixel-by-pixel basis, between merging and spatial interpolation depending on the dynamics of the missing pixel, so that the
advantages of merging in steady regions are fully maintained. A motion detection scheme should be used to classify
the missing pixel as a "moving pixel" or "steady pixel".

In US 4,472,732, issued Sept. 16, 1984, Bennett et al. disclose such a method that uses the pixel-by-pixel difference
of neighboring fields with the same polarity (e.g., even fields) that follow and precede the field that will be deinterlaced
(e.g., an odd field) to perform motion detection, and then switches between merging and vertical interpolation depending
on the presence or absence of motion, which is determined by thresholding the difference values. This particular
approach may falsely detect "no motion" if the scene is such that the gray levels of the pixels being compared in the two
neighboring fields are similar although there is motion in the scene. Such a situation may happen, for instance, in the case
of scenes that contain a small object 10 moving against a uniform background 12 in the direction of arrow A as shown
in Fig. 1, where fields (k), (k+1), and (k+2) represent successive interlaced video fields. In this case, merging of the
fields (k) and (k+1) at a region of interest, denoted as the box 14, will result in artifacts due to a false classification of
no motion between fields (k) and (k+2). If a consecutive fourth field, field (k+3) in Fig. 1, is used in motion detection, a
comparison of fields (k+1) and (k+3), in addition to the comparison of fields at times (k) and (k+2), may increase the
reliability of motion detection. This is evident in the example shown in Fig. 1, where a "moving" decision can be rendered
for the region of interest in the frame at time (k+1) as a result of comparing the corresponding image values at fields
at times (k+1) and (k+3). Motion-detection based deinterlacing techniques that utilize four consecutive fields, and switch
between spatial interpolation and merging, have been discussed in US 4,785,351, issued to Ishikawa, Nov. 15, 1988,
and in US 5,021,870, issued to Motoe et al., June 4, 1991.
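
The failure case of Fig. 1 can be illustrated with a small sketch. It is not taken from the cited patents: the function names are hypothetical, and fields of opposite polarity are compared at the same array index, which ignores the one-line vertical offset between them.

```python
def difference_mask(field_a, field_b, threshold):
    """Mark a pixel as moving (True) when two same-polarity fields differ by more than threshold."""
    return [[abs(a - b) > threshold for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(field_a, field_b)]


def three_field_motion(f_k, f_k2, threshold):
    """Three-field detection: compare only fields (k) and (k+2)."""
    return difference_mask(f_k, f_k2, threshold)


def four_field_motion(f_k, f_k1, f_k2, f_k3, threshold):
    """Four-field detection: also compare fields (k+1) and (k+3), then combine with OR."""
    first = difference_mask(f_k, f_k2, threshold)
    second = difference_mask(f_k1, f_k3, threshold)
    return [[a or b for a, b in zip(r1, r2)] for r1, r2 in zip(first, second)]


# Region of interest: the small object is visible only in field (k+1).
background = [[50, 50, 50]]
with_object = [[50, 200, 50]]
print(three_field_motion(background, background, 10))
print(four_field_motion(background, with_object, background, background, 10))
```

With three fields the region is classified as steady because fields (k) and (k+2) both show background; the extra comparison of (k+1) and (k+3) flags the object pixel.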

These techniques that adapt themselves to the presence of motion are not effective in producing a high quality
still image in the case of video images that contain dominant motion between fields. In such cases, the above techniques
will default to spatial interpolation only, and thus no additional improvement in resolution will be obtained. Video images
with dominant motion result, for example, from the motion of hand-held cameras and/or cameras that are panned and
zoomed. Since hand-held video cameras are becoming increasingly common in consumer applications, there is a
growing interest in a deinterlacing method (e.g., to be used in [...] from video) that improves the resolution via motion
compensated [...] neighboring fields.

A motion-compensated deinterlacing [...] between fields is discussed by Wang et al. in US 5,134,480, issued
July 28, 1992. The technique [...] method that performs motion estimation and compensation on a [...] basis [...]. Due to
the time-recursive nature of the method, a [...] of interest is utilized. A quad-tree hierarchy is used in adjusting the
block size [...] accuracy of local motion estimation. Deinterlacing is implemented by linearly blending the [...] vertical
interpolation and motion compensated interpolation, where [...] the field of interest or the recursively deinterlaced [...].

[...] motion vectors at boundaries of objects that move independent of each other. [...] estimate local motion, thus
allowing for [...] very difficult to solve in a robust manner [...] processing, and is especially difficult in creating
high-quality stills from video, since small [...] using models such as affine or perspective [...], the challenge is not to
produce artifacts when the actual motion deviates from the model.
[...] merging the first and second fields in areas of no motion and performing spatial interpolation on the first field
in [...], the system and method are improved by removing dominant motion [...] prior to local motion detection.

The system and method of this invention [...] attention is directed towards producing a robust method [...]
estimation/compensation failures are minimized. We refer to the motion field estimate [...] of a global model as the
dominant motion. The method and system of the present invention [...] and by reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the prior art methods of detecting motion between successive fields in a video image signal;

Figure 2 is a block diagram illustrating the major steps of the invention;

Figure 3 is a schematic diagram depicting a system suitable for implementing the present invention;

Figure 4 is a detailed block diagram showing the dominant motion compensated filtering employed in the present
invention;

Figure 6 shows a detailed block diagram of adaptive motion detection as employed in the present invention.

[...] numerals [...] designate identical [...]

DETAILED DESCRIPTION OF THE INVENTION

[...] denoted by [...], respectively. [...] to produce motion compensated [...] detecting local motion in the fields using
the first [...] the first field and the second motion compensated [...]

[...] motion removal stage [...], dominant motion is [...] region of interest (ROI), to produce the [...] fields [...].
Although the described [...] from video, user selection of a ROI is not a requirement [...] it is possible to choose the
ROI as the entire field, at the cost of increased computation. The ROI may be chosen to represent a region in [...] where
motion detection is performed to detect regions of video moving with [...] the dominant motion [...] in the sense that [...]



Remove Dominant Motion 



This stage begins by taking the input fields [...] using spatial vertical linear interpolation. This is depicted as [...].
[...] motion is then estimated by motion estimators 46 between the frames [...]. Note that we do not [...] relating the
frames as

$$F_1^L(x, y) = F_0^L(x + c_1 + c_2 x + c_3 y,\; y + c_4 + c_5 x + c_6 y) \qquad (1)$$

and the motion estimation requires estimating the parameters [...]. [...] use the method discussed in Bergen et al. [...],
IEEE Trans. Pattern Anal. Mach. Intel., Vol. 14, pp. 886-896, 1992, which enforces a continuous optical flow constraint,
and solves a [...] the video signal f(x, y, t) [...] be effective. Additionally, because the Taylor series expansion
assumes a small motion, the algorithm is used iteratively.

Given the motion parameter sets denoted by [...] corresponding to motion between F_0^L and F_1^L, and F_0^L and
F_2^L, respectively, motion trajectory filtering 48 is used to compute the motion compensated fields [...]. Frames F_1^L
and F_2^L are warped [...] which are estimates of the frame F_0. Frames [...] are [...] even fields [...] which is the
motion compensated estimate of F_0. Finally the frame [...]
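
The six-parameter model of equation (1) can be made concrete with a short sketch that predicts one frame from another for a given parameter set. It only illustrates the model, not the estimation method cited in the text; the function names, the bilinear sampling, and the clamping of coordinates at the image border are all assumptions made for this example.

```python
import math


def sample_bilinear(img, x, y):
    """Sample img (list of rows) at a real-valued position, clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def predict_frame(f0, c):
    """Evaluate the right-hand side of equation (1) at every pixel (x, y)."""
    c1, c2, c3, c4, c5, c6 = c
    h, w = len(f0), len(f0[0])
    return [[sample_bilinear(f0,
                             x + c1 + c2 * x + c3 * y,
                             y + c4 + c5 * x + c6 * y)
             for x in range(w)]
            for y in range(h)]


f0 = [[0, 1, 2, 3],
      [4, 5, 6, 7]]
# Pure horizontal translation by one pixel: c1 = 1, all other parameters zero.
print(predict_frame(f0, (1, 0, 0, 0, 0, 0)))
```

Setting only c1 and c4 gives a pure pan; the remaining four parameters add the linear terms that let the same model describe zoom and rotation.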
Detect Local Motion

[...] scheme. For example, if we denote the SAD value for the missing pixel location at (n1, n2-1) at time index k as
S_k(n1, n2-1), then the SAD computed over a 3x3 window is given by

$$S_k(n_1, n_2 - 1) = \sum_{m_1=-1}^{1} \sum_{m_2=-1}^{1} \left| f_0^{MC}(n_1 + m_1, n_2 + 2 m_2, k) - f_2^{MC}(n_1 + m_1, n_2 + 2 m_2, k + 2) \right| \qquad (2)$$

where f_0^{MC}(n1, n2, k) and f_2^{MC}(n1, n2, k) represent the corresponding continuous fields sampled over a
progressive lattice. If S_k(n1, n2-1) is larger than a threshold, then motion is detected and a "1" is entered at the
corresponding location of the motion detection map array. Otherwise, motion is decided to be not present and a "0" is
entered at the corresponding location in the motion detection map. In the following, we describe the method for
selecting a dynamic threshold that is spatially adaptive.
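
As a rough illustration of equation (2) with a single fixed threshold, the sketch below sums absolute differences over three columns and three same-parity rows. It assumes both summation indices run from -1 to 1 (suggested by the 3x3 window), stores frames as lists of rows indexed [n2][n1], and simply skips samples that fall outside the image; the function names are hypothetical.

```python
def sad_3x3(f0, f2mc, n1, n2):
    """Sum of absolute differences over columns n1-1..n1+1 and rows n2-2, n2, n2+2."""
    height, width = len(f0), len(f0[0])
    total = 0
    for m2 in (-1, 0, 1):
        for m1 in (-1, 0, 1):
            row, col = n2 + 2 * m2, n1 + m1
            # Samples outside the frame are skipped (a boundary choice of this sketch).
            if 0 <= row < height and 0 <= col < width:
                total += abs(f0[row][col] - f2mc[row][col])
    return total


def motion_flag(f0, f2mc, n1, n2, threshold):
    """Return 1 (motion) if the SAD exceeds the threshold, otherwise 0."""
    return 1 if sad_3x3(f0, f2mc, n1, n2) > threshold else 0


f0 = [[0] * 3 for _ in range(6)]
f2mc = [row[:] for row in f0]
f2mc[2][1] = 10  # a single differing pixel on a same-parity line
print(sad_3x3(f0, f2mc, 1, 2), motion_flag(f0, f2mc, 1, 2, threshold=5))
```

The row stride of two reflects that only lines of the same parity are compared.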

To begin, we note that as a result of the motion compensation step, the dominant motion has already been removed,
and motion detection applied between the fields f_0 and f_2^{MC} will be detecting the motion in objects moving
independently from the dominant motion. Because of the global motion assumption made during filtering along the motion
trajectory in the motion compensation stage, there will actually be a motion "ghost" present in the field f_2^{MC}. As
shown in Fig. 5, the motion ghost 56 of the small object 10 appears in fields f_1^{MC} and f_2^{MC}. The ghosting has the
positive effect of allowing a 3-field method to detect the motion of fine-detailed objects independently moving against a
uniform background, while only comparing a single pair of fields, since the ghost will be present for motion detection.
However, the intensity difference between the background and the ghost is now reduced by a factor of two, due to
averaging, compared to the intensity difference between the object itself and the background. Thus, a small threshold is
required for detection of motion. The problem now is that if the comparison threshold is set too low, motion will be
falsely detected around sharp edges that have undergone global motion but were not perfectly compensated for due to
inaccuracies in the dominant-motion estimation. To address this problem, we use adaptive thresholding, as explained below.
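
The factor of two can be checked with a one-line calculation. As assumed notation (not in the original), let b be the background intensity, o the object intensity, and g the ghost intensity, and suppose the object is present at a given location in only one of the two warped frames that are averaged:

```latex
g = \tfrac{1}{2}(o + b), \qquad |g - b| = \tfrac{1}{2}\,|o - b|
```

For instance, o = 200 and b = 100 give g = 150, so the ghost differs from the background by 50 where the object itself differs by 100.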

The following two observations constrain the design of an adaptive thresholding scheme. First, the motion detection
errors will give rise to artifacts, and these will be most noticeable in image regions of low local variance, especially
when no motion is detected. Therefore, in regions of low variance, the comparison threshold used in motion detection
should be low, especially for accurate detection of ghosts. Second, in regions of high variance, artifacts due to motion
detection will be less noticeable. Additionally, when compared to deinterlacing via an intrafield technique, it is in the
regions of higher variance that motion compensated filtering will produce the greatest increase in resolution, and the
greatest reduction in aliasing. Therefore, in regions of higher variance the motion detection threshold should be high.
This will also provide an implicit tolerance to errors in dominant-motion estimation at such regions; the benefits of such
tolerance have been discussed above. The proposed motion detection scheme depicted in Fig. 6 has these properties.

In reference to Fig. 6, the first step in the motion detection is to locally estimate 58 a sample standard deviation
of field f_0 at every pixel over a 3x3 region, where the pixels lie at the centers of the 3x3 windows. The result is a 2-D
array, i.e., an image of standard deviation values (S.D. image). To spatially adapt the threshold to the S.D. image, but
remove dependency on abnormally high values for standard deviation, the S.D. image is dynamically quantized 60
into four levels using, for example, the well known k-means clustering algorithm with k=4 (see J.S. Lim, Two-Dimensional
Signal and Image Processing, Prentice Hall, 1990, page 607). The four-level standard deviation image output from the
dynamic quantization process is then normalized to the interval [0,1], and used to scale a user specified maximum
threshold 62. This results in four thresholds that adapt to regions of lower and higher variance. Each pixel location is
assigned to one of these four thresholds. The regions of the largest quantized standard deviation use the specified
maximum threshold, and as the standard deviation decreases in the remaining three quantization levels so does the
threshold. These thresholds are then applied 64 to the image of SAD values computed 66 between f_0 and f_2^{MC}, to
produce the output motion detection map. We have obtained very good performance on numerous test images when the
maximum threshold is set to 20.
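
The adaptive thresholding steps can be sketched in a few functions. This is an illustration under stated assumptions rather than the implementation of Fig. 6: the k-means routine is a plain one-dimensional Lloyd iteration with evenly spaced starting centers, windows are truncated at the image border, and normalization to [0,1] is done by dividing each level by the largest level. All function names are hypothetical.

```python
import math


def local_std(img):
    """Sample standard deviation over the 3x3 window centered on each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / max(len(vals) - 1, 1)
            out[y][x] = math.sqrt(var)
    return out


def kmeans_1d(values, k=4, iterations=20):
    """Quantize scalar values into k levels; returns the k cluster centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[idx].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def threshold_map(sd_img, max_threshold=20.0, k=4):
    """Per-pixel threshold: quantized S.D. level, normalized by the top level, times max_threshold."""
    centers = kmeans_1d([v for row in sd_img for v in row], k)
    top = max(centers)

    def threshold_for(v):
        level = min(centers, key=lambda c: abs(v - c))
        return max_threshold * level / top if top > 0 else max_threshold

    return [[threshold_for(v) for v in row] for row in sd_img]


print(threshold_map([[0.0, 1.0, 5.0, 10.0]], max_threshold=20.0))
```

A motion map would then be obtained by comparing each SAD value with the threshold at the same location; pixels in the highest-variance level are compared against the full maximum threshold.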
Merge/Interpolate

This stage of the present invention creates the final deinterlaced image \hat{F}_0 on the basis of the motion detection
map. For the even lines, the lines of f_0 are copied to form the even lines of \hat{F}_0. For the odd lines of \hat{F}_0,
the motion map from the motion detection stage is used to decide whether to replace the missing pixels in f_0 by merging
pixels from f_1^{MC} (compensated pixels), or by using intrafield interpolation (non-compensated pixels). The intrafield
interpolation algorithm we propose is an extension of Isnardi's directional interpolation method, where we have
incorporated a mechanism for reducing artifacts at areas where contours are not correctly identified.
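
The per-pixel switch of this stage can be sketched as follows. The example assumes the fields are lists of rows, that the motion map holds one 0/1 entry per missing pixel, and it uses plain vertical averaging as a stand-in for the directional intrafield interpolation; the function name is hypothetical.

```python
def assemble_frame(f0, f1mc, motion_map):
    """Build the output frame: even lines from f0, odd lines merged or interpolated per pixel."""
    frame = []
    for i, even_row in enumerate(f0):
        frame.append(list(even_row))
        below = f0[i + 1] if i + 1 < len(f0) else even_row
        odd_row = []
        for x, moving in enumerate(motion_map[i]):
            if moving:
                # Local motion: intrafield interpolation (vertical average as a stand-in).
                odd_row.append((even_row[x] + below[x]) / 2)
            else:
                # No local motion: merge the motion compensated pixel.
                odd_row.append(f1mc[i][x])
        frame.append(odd_row)
    return frame


f0 = [[0, 0], [100, 100]]
f1mc = [[40, 40], [90, 90]]
motion_map = [[0, 1], [0, 0]]
print(assemble_frame(f0, f1mc, motion_map))
```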

A straightforward method of detecting the direction of a contour is to compare the SAD over windows which pivot
about the pixel to be interpolated. For example, a 3 pixels by one (3x1) line window would produce a SAD for the
direction m, denoted by S_m, as

$$S_m = \sum_{l=-1}^{1} \left| f_p(n_1 + m + l, n_2 - 1, k) - f_p(n_1 - m + l, n_2 + 1, k) \right| \qquad (3)$$

The values of S_m are computed for some predetermined range of slopes m = -a, ..., a, and the direction m over which
S_m is a minimum is the slope chosen for directional interpolation. To test for fractional slopes, the image lines can be
upsampled in the horizontal direction, and the algorithm can be applied to the upsampled image.

When the slope m is not correctly estimated, artifacts will result. These artifacts occur in regions of fine detail where
a well defined contour does not exist. This occurs because the SAD values for each test window pair are effectively
random. As such, the directional choice becomes random. A remedy for this problem is to detect the
presence of a well-defined contour, and only use directional interpolation when such a contour is found. A simple but
effective means of performing this task is to scale the SAD value for the vertical direction, S_0, by a factor between 0
and 1, so that the decision is biased towards using vertical interpolation. Then, if there is no well-defined contour,
vertical interpolation is used, but if there is a well-defined contour, directional interpolation is used. It has been
experimentally determined that the multiplying factor of 0.6 yields a good compromise between detecting the presence of a
well defined contour, and providing a satisfactory estimate of the contour direction. We have obtained excellent results
with numerous images when SAD is computed over 7x3 windows, and m=2 with slopes computed at 0.5 pixel intervals.
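
A minimal sketch of the biased direction search, using the 3x1 window of equation (3), is given below. It assumes integer slopes only (no horizontal upsampling), clamps column indices at the image border, and keeps the vertical direction on ties; the function name and these choices are illustrative, not part of the described method.

```python
def directional_interpolate(above, below, n1, a=2, vertical_bias=0.6):
    """Interpolate the missing pixel at column n1 between the lines above and below.

    Returns (value, chosen_slope). S_0 is scaled by vertical_bias so that a
    non-vertical direction is chosen only when it matches clearly better.
    """
    width = len(above)

    def clamp(i):
        return min(max(i, 0), width - 1)

    def sad(m):
        return sum(abs(above[clamp(n1 + m + l)] - below[clamp(n1 - m + l)])
                   for l in (-1, 0, 1))

    best_m, best_s = 0, vertical_bias * sad(0)
    for m in range(-a, a + 1):
        if m != 0 and sad(m) < best_s:
            best_m, best_s = m, sad(m)
    value = (above[clamp(n1 + best_m)] + below[clamp(n1 - best_m)]) / 2
    return value, best_m


# A diagonal edge: it sits two columns further right on the upper line.
above = [0, 0, 0, 100, 100, 100, 100]
below = [0, 100, 100, 100, 100, 100, 100]
print(directional_interpolate(above, below, n1=2))
print(directional_interpolate(above, below, n1=1))
```

Along the detected slope the two samples agree, so the edge stays sharp; plain vertical averaging at column 2 would have produced the intermediate value 50.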

The present invention can be applied to multispectral (i.e., color) video as follows. Given the three-channel color
video, represented by red, green and blue (R,G,B) channels, a transformation is applied to transform it to a
luminance-chrominance space, such as (Y,U,V), where Y denotes the luminance and U and V denote the associated chroma
channels. Dominant motion estimation is performed on the luminance channel. The estimated motion is then used in
processing red, green and blue channels, separately, according to the invention, to generate the resulting high-quality,
color still image.
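
The color workflow can be outlined as below. The luminance weights are the common ITU-R BT.601 values, which are an assumption here since the text does not specify the transformation; `estimate_motion` and `deinterlace_channel` are hypothetical callables standing in for the dominant motion estimation and the per-channel processing.

```python
def luminance(r, g, b):
    """Luminance from R, G, B using BT.601 weights (assumed, not stated in the text)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_plane(red, green, blue):
    """Apply the luminance transform pixel by pixel to three channel planes."""
    return [[luminance(r, g, b) for r, g, b in zip(rr, gr, br)]
            for rr, gr, br in zip(red, green, blue)]


def deinterlace_color(fields_rgb, estimate_motion, deinterlace_channel):
    """Estimate motion once on luminance, then process R, G and B separately.

    fields_rgb: list of fields, each a dict with 'R', 'G' and 'B' planes.
    """
    luma_fields = [luminance_plane(f["R"], f["G"], f["B"]) for f in fields_rgb]
    motion = estimate_motion(luma_fields)
    return {ch: deinterlace_channel([f[ch] for f in fields_rgb], motion)
            for ch in ("R", "G", "B")}
```

Estimating the motion once on luminance keeps the three color channels aligned with each other, since all of them are processed with the same motion parameters.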

The invention has been described with reference to a preferred embodiment. However, it will be appreciated that
variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope
of the invention.



Claims 

1. In a method for creating a high quality still image from a series of interlaced video fields of the type employing local
motion detection between first and third fields of the series, merging the first and second fields in areas of no
motion and performing spatial interpolation on the first field in areas containing local motion, the improvement
comprising: removing dominant motion from the second and third fields prior to local motion detection.

2. The improved method claimed in claim 1, wherein the spatial interpolation is contour sensitive spatial interpolation.

3. The improved method claimed in claim 1, wherein the local motion detection employs an adaptive dynamic threshold.

4. The improved method claimed in claim 1, wherein the method is applied to a designated region of interest in an
image.

5. A method for creating a high quality still image from a series of three successive interlaced video fields, comprising
the steps of:

a. interpolating each field in the series to a full frame to produce a series of interpolated fields;

b. estimating the dominant motion of the scene between the first interpolated field and the second and third
interpolated fields using a global motion model;

c. compensating for the estimated dominant motion in the second and third interpolated fields by warping the
second and third interpolated fields to align them with the first interpolated field;

d. averaging the second and third warped interpolated fields to provide an averaged dominant motion
compensated interpolated field;

e. splitting the averaged motion compensated interpolated field to form second and third averaged dominant
motion compensated fields;

f. detecting local motion between objects in the first field and the third averaged dominant motion compensated
field; and

g. merging the first field and the second averaged dominant motion compensated field in regions of the image
that are free of local motion, and performing contour sensitive spatial interpolation on the first field in areas
containing local motion.

6. The method for creating a high quality still image claimed in claim 5, further comprising the steps of:

a. designating a region of interest in the image for performing the method; and

b. performing the method steps on the designated region of interest.

7. A system for creating a high quality still image from a series of three interlaced video fields, comprising:

a. means for removing dominant motion from the second and third fields to produce dominant motion
compensated second and third fields;

b. means for detecting local motion between the first field and the third dominant motion compensated field; and

c. means for merging the first field and the second dominant motion compensated field in areas of no motion
and performing spatial interpolation on the first field in areas containing local motion.

8. The system claimed in claim 7, wherein the means for performing spatial interpolation employs contour sensitive 
spatial interpolation. 

9. The system claimed in claim 7, wherein the means for local motion detection employs an adaptive dynamic threshold.

10. A system for creating a high quality still image from a series of three successive interlaced video fields, comprising:

a. means for interpolating each field in the series to a full frame to produce a series of interpolated fields;

b. means for estimating the dominant motion of the scene between the first interpolated field and the second
and third interpolated fields using a global motion model;

c. means for compensating for the estimated dominant motion in the second and third interpolated fields by
warping the second and third interpolated fields to align them with the first interpolated field;

d. means for averaging the second and third warped interpolated fields to provide an averaged dominant
motion compensated interpolated field;

e. means for splitting the averaged motion compensated interpolated field to form second and third averaged
dominant motion compensated fields;

f. detecting local motion between objects in the first field and the third averaged dominant motion compensated
field; and

g. means for merging the first field and the second averaged dominant motion compensated field in regions
of the image that are free of local motion, and performing contour sensitive spatial interpolation on the first
field in areas containing local motion to produce the high quality still image.